Abstract:


A recommendation system is an important type of machine
learning application that provides precise suggestions to
users. Recommendation systems are used in many areas, such as
playlist generation for music and video services like
JioSaavn, Wynk and Amazon Prime Music, and product
recommendation in e-commerce and other commercial
applications. These recommendations make it faster and easier
for users to find the products they are interested in. For
each user, the recommendation system predicts future
preferences over a set of items and recommends the top items.
In several industries, recommendation systems are very useful
because they generate a large amount of revenue and help a
business stand out from its competitors. Because of the
overwhelming number of items that each user can find on the
web, the importance of recommendation systems on the internet
has increased.
Recommendation systems are also used for personalized
navigation over large amounts of data, particularly in the
social media domain for recommending friends. A
recommendation system is a subclass of information filtering
system that seeks to predict the rating a user would give to
an item. The similarity measures calculated in this research
are Jaccard distance and the Otsuka-Ochiai coefficient. The
features extracted in this paper are Adar index, PageRank,
Katz centrality and HITS score. Nowadays many researchers are
implementing different types of algorithms in various domains
for recommendation systems.
I. INTRODUCTION


A recommendation system makes recommendations to users
based on their preferences. In recent times social media has
enjoyed a great deal of success, with millions of users
visiting sites such as Facebook and Twitter for social
networking. Using computational methods such as natural
language processing, data mining and machine learning, a
social recommendation system analyses the collective
intelligence in data from wikis, query logs, Q&A communities
etc. The recommendation system uses information filtering
techniques to suggest the information that users are most
interested in. A social recommendation system recommends
friends in social media applications such as Facebook,
Twitter and Instagram.


Over the last couple of years, personalized recommendation
systems for social media have come into existence. For
example, StumbleUpon is a customized recommendation system
which suggests web pages to users based on their own ratings,
the ratings given by users with similar interests, and
topics. The proposed system recommends friends to a user
based on the followers and followees of that user. The
dataset contains 9437519 edges, each a pair of source and
destination nodes; 80% of the data is used for training and
20% for testing. The classification algorithm used in this
model is the Random Forest Classifier. Random forest is a
tree-based algorithm, so it works well for high-dimensional
data and data that is not linearly separable. Performance is
measured by precision, recall and F1 score. A confusion
matrix is used to describe the performance of the
classification model on the test data.


II. STATE OF THE ART


1). Snigdha Luthra et al. (2019) observe that a social
network is dynamic because its structure changes at different
timestamps: the network obtained at time t differs from the
one at time t+1. To predict the ongoing changes in the
network, graph embedding techniques are used to obtain an
unsupervised graph representation with different parameters
of nodes and edges, which can be used in machine learning
methods. For this experiment, a community detection algorithm
performs clustering to group together nodes that have a
similar edge betweenness factor. The proposed framework
decides which further connections may be established based
on nodes belonging to the same cluster.


2). Ivana Andjelkovic et al. (2019) proposed a
recommendation system for musical artists which introduces a
novel interactive visualization of moods and artists. Within
the visualization, manipulation of an avatar provides control
over the system and explanation of its recommendations. The
design and implementation of an online experiment are
presented to evaluate the system under four conditions with
varying degrees of visualization, interaction and control.
The results show that a certain combination of interactive
features and interface design improves perceived and
objective recommendation accuracy. Self-reported user
satisfaction with the recommendation system, which lets
people know the mood of an artist's music, combined with
relevant interactivity in the music recommendation, can
change the accuracy of the recommendation.
3). Imane Belkhadir et al. (2019) presented a combined
social regularization approach that incorporates trust
information and social network information to identify a
comprehensive trust path in the social graph. The proposed
system recommends friends to users based on other users who
have similar preferences and tastes, and recommends experts
in a given field. Based on these conditions the system
proposes a matrix factorization framework. The correlation
between users and items is calculated, together with the
shortest path between the appropriate groups of friends,
which are clustered to obtain accurate friend
recommendations. To constrain the matrix factorization
framework, tags and friendships are added as regularization
terms.


4). Neha Verma et al. (2019) proposed a recommendation
system to understand the user on e-commerce websites such as
Flipkart, Amazon and Netflix. The workflow of the
recommendation system is divided into two segments: an
information gathering segment and an analysis segment. The
proposed system builds the recommendation system using
traditional, hybrid and modern techniques, and information
about the users is collected from their buying, searching
and other habits. The collected information is then passed
to the analysis segment. Based on this analysis of the user,
the recommendations are generated.


5). Swathi Sambangi et al. (2019) proposed a recommendation
system that helps users choose customized products. The
primary goal of the proposed system is to provide one-to-one
text-based review recommendation. The proposed system takes
data obtained from the Amazon API. The Amazon recommendation
system is one of the effective technologies that has had
great success with the information available on the
internet. This technology helps users choose customized
products, which increases user satisfaction.


6). Peng Liu et al. (2019) proposed a dynamic graph-based
embedding (DGE) model for real-time social recommendation.
The DGE model captures user behaviour patterns, social
relationships and temporal semantic effects in a unified
way. To capture the semantic effect of the edges, a
probability matrix is formulated. To generate top
recommendations in large-scale social media, an incremental
learning algorithm and query processing techniques are
developed. The proposed recommendation system is based on
the proximity of related users and items, while also taking
the properties of the items into account. Evaluation on two
real-world datasets demonstrates the efficiency of the
proposed approach.


III. ARCHITECTURE OF PROPOSED SYSTEM

A). DATASET OVERVIEW 

The dataset is taken from Facebook's recruiting challenge on
Kaggle and comprises two columns, Source and Destination.
Dataset link - https://www.kaggle.com/c/FacebookRecruiting.

The dataset contains approximately 94 lakh (about 9.4
million) edges. The data is split into training data and
testing data: 80% is used for training and 20% for testing.

Total number of nodes present in the data: 1862220
Total number of edges present in the data: 9437519

Data Columns        Data type
Source Node         int64
Destination Node    int64

The number of people common to both test and train data:
1063125
The number of people present in train data but not in test
data: 717597
The number of people present in test data but not in train
data: 81498
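
The three overlap counts above sum to 1063125 + 717597 + 81498 = 1862220, which matches the total number of nodes. A minimal sketch of how such counts could be obtained from two edge lists follows; the function name `node_overlap` and the toy edges are illustrative assumptions, not the authors' code.

```python
def node_overlap(train_edges, test_edges):
    """Count nodes shared by, or unique to, the train and test edge lists."""
    train_nodes = {node for edge in train_edges for node in edge}
    test_nodes = {node for edge in test_edges for node in edge}
    return {
        "common": len(train_nodes & test_nodes),
        "train_only": len(train_nodes - test_nodes),
        "test_only": len(test_nodes - train_nodes),
    }

# Toy edge lists (hypothetical data, not taken from the Kaggle dataset).
counts = node_overlap([(1, 2), (2, 3), (3, 4)], [(2, 3), (4, 5)])
```

Nodes that occur only in the test data have no training history, so features computed for them from the training graph may be empty.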


Fig A. Process for proposed Architecture (recoverable steps:
Random Forest Classifier is used; Precision, Recall and F1
Score are calculated).


B). MAPPING THE PROBLEM INTO SUPERVISED 
LEARNING PROBLEM 

Training samples of good (existing) and bad (missing) links
are generated from the given directed graph. For each link a
set of features is computed, such as number of followers,
whether the user is followed back, PageRank, Katz score, Adar
index, SVD features of the adjacency matrix and weight
features. A machine learning model is then trained on these
features to predict links.
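
One way to produce such labelled samples is sketched below. It assumes that existing edges serve as positive examples and that an equal number of randomly drawn non-edges serve as negative examples; the paper does not state how the bad links were chosen, so the sampling rule and the name `build_link_samples` are assumptions.

```python
import random

def build_link_samples(edges, seed=0):
    """Return (pairs, labels): existing edges get label 1 and an equal
    number of randomly sampled non-edges get label 0.
    Assumes the graph has at least as many non-edges as edges."""
    rng = random.Random(seed)
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    negatives = set()
    while len(negatives) < len(edge_set):
        u, v = rng.choice(nodes), rng.choice(nodes)
        # Skip self-loops and pairs that are already connected.
        if u != v and (u, v) not in edge_set:
            negatives.add((u, v))
    pairs = sorted(edge_set) + sorted(negatives)
    labels = [1] * len(edge_set) + [0] * len(negatives)
    return pairs, labels

toy_edges = [(1, 2), (2, 3), (3, 1), (4, 1)]  # hypothetical directed edges
pairs, labels = build_link_samples(toy_edges)
```

Each pair would then be turned into a feature vector before being passed to the classifier.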
C). IN-DEGREE -

The number of edges directed into a vertex in a directed
graph is called its in-degree. In other words, it is the
number of incoming edges for a particular node. In-degree is
used to find the number of followers of each user.


The average In-Degree for the total data: 5.0679 


D). OUT-DEGREE — 

The number of edges directed away from a vertex in a
directed graph is called its out-degree. In other words, it
is the number of outgoing edges for a particular node.
Out-degree is used to find the number of people each user is
following.


The average Out-Degree for the total data: 5.0679 
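
The two averages coincide because every edge adds one to exactly one in-degree and one out-degree, so both equal the number of edges divided by the number of nodes: 9437519 / 1862220 ≈ 5.0679. The sketch below counts degrees from an edge list; the helper names and toy edges are illustrative assumptions.

```python
from collections import Counter

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    return in_degree, out_degree

def average_degree(edges):
    """Average in-degree (equal to average out-degree): edges / nodes."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) / len(nodes)

toy_edges = [(1, 2), (3, 2), (2, 1), (1, 4)]  # hypothetical follower edges
in_degree, out_degree = degree_counts(toy_edges)
toy_average = average_degree(toy_edges)
reported_average = round(9437519 / 1862220, 4)
```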


E). JACCARD DISTANCE — 


$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$


Formula E) - Jaccard Distance


Jaccard distance, as used here, is a measure of similarity
between two data nodes ranging from 0% to 100%. It is
defined as the ratio of the size of the intersection of two
sets to the size of the union of the two sets. (In standard
terminology this ratio is the Jaccard similarity
coefficient, and the Jaccard distance is one minus it.) The
higher the value, the higher the chance that an edge exists
between the two nodes. The measure is very sensitive to
small sets and can give misleading results for them.
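
As a small illustration, the ratio of intersection size to union size can be computed on two sets as sketched below. Treating the sets as the followee sets of two users is an assumption made for the example; the convention of returning 0 for two empty sets is also an assumption.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)

# Hypothetical followee sets of two users.
score = jaccard({1, 2, 3}, {2, 3, 4})
```

Here the two sets share 2 of 4 distinct elements, so the score is 0.5.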


F). COSINE-DISTANCE (OTSUKA-OCHIAI
COEFFICIENT)

$$K = \frac{|A \cap B|}{\sqrt{|A| \times |B|}}$$


Formula F) - Otsuka-Ochiai coefficient
The Otsuka-Ochiai coefficient is the ratio of the number of
elements in the intersection of A and B to the square root
of the number of elements in A multiplied by the number of
elements in B.
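
The definition translates directly into a few lines of code. The sketch below assumes A and B are sets of neighbouring users and returns 0 when either set is empty; both choices are assumptions for illustration.

```python
import math

def otsuka_ochiai(a, b):
    """|A intersect B| divided by sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # assumed convention when a set is empty
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical neighbour sets: 2 shared elements, each set has 4.
score = otsuka_ochiai({1, 2, 3, 4}, {3, 4, 5, 6})
```

With two shared elements and four elements in each set the score is 2 / sqrt(16) = 0.5.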


G). PAGE RANK — 

PageRank is an algorithm that was designed to rank the
importance of web pages. Given a directed graph, the
PageRank algorithm gives each vertex a score that represents
the importance of that vertex in the directed graph. The
NetworkX library is used to compute PageRank.
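
To show what the score means, the following is a plain-Python power-iteration sketch of PageRank. It is not the NetworkX implementation (NetworkX provides `networkx.pagerank`); the damping factor 0.85, the fixed iteration count and the toy edges are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical graph: users 1, 3 and 4 follow user 2; user 2 follows user 1.
scores = pagerank([(1, 2), (3, 2), (4, 2), (2, 1)])
top_node = max(scores, key=scores.get)
```

User 2 receives the highest score because it has the most incoming links.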


H). CHECKING FOR SAME WEAKLY CONNECTED
COMPONENTS -

If two users belong to the same weakly connected component,
there is a higher chance of an edge being present between
them. A weakly connected component is a subgraph of the
given directed graph in which every pair of nodes is
connected when edge directions are ignored; it corresponds
to a connected component of the underlying undirected
graph.
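
A minimal sketch of this check is given below: edge directions are dropped, each node is labelled with its component by breadth-first search, and two users are compared by label. The function names and the toy edges are illustrative assumptions.

```python
from collections import defaultdict, deque

def weakly_connected_labels(edges):
    """Label every node with a representative of its weakly connected
    component, found by breadth-first search ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    label = {}
    for start in list(neighbours):
        if start in label:
            continue
        label[start] = start
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in label:
                    label[nxt] = start
                    queue.append(nxt)
    return label

def same_component(label, u, v):
    """Binary feature: True when u and v share a weakly connected component."""
    return u in label and v in label and label[u] == label[v]

labels = weakly_connected_labels([(1, 2), (3, 2), (4, 5)])  # hypothetical edges
```

In this toy graph users 1 and 3 share a component through user 2, while user 4 lies in a different one.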
I). ADAR INDEX -


$$A(x, y) = \sum_{u \in N(x) \cap N(y)} \frac{1}{\log(|N(u)|)}$$


Formula I) - Adar Index


The Adar index, or Adamic-Adar index, is the sum of the
inverse logarithmic degrees of the common neighbours of two
given vertices. The NetworkX library is used to compute the
Adar index.
J). KATZ CENTRALITY -

Katz centrality computes the centrality of a node based on
the centrality of its neighbours. It is a generalization of
eigenvector centrality. The Katz centrality for node i is

$$x_i = \alpha \sum_{j} A_{ij} x_j + \beta$$

Formula J) - Katz Centrality

where A is the adjacency matrix of the graph G with
eigenvalues λ. The NetworkX library is used to compute the
Katz centrality.
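
The recurrence can be solved by repeated substitution, as in the sketch below. It is a plain-Python illustration, not the NetworkX routine (`networkx.katz_centrality`): the values alpha = 0.1 and beta = 1.0 are assumed, the result is left unnormalized, and A_ij is taken to be 1 when node j links to node i, so centrality flows along incoming links. The iteration converges only when alpha is smaller than the reciprocal of the largest eigenvalue of A.

```python
def katz_centrality(edges, alpha=0.1, beta=1.0, iterations=100):
    """Iterate x_i = alpha * sum_j A_ij x_j + beta, with A_ij = 1 when
    node j links to node i (an assumed convention for directed graphs)."""
    nodes = sorted({node for edge in edges for node in edge})
    predecessors = {node: [] for node in nodes}
    for src, dst in edges:
        predecessors[dst].append(src)
    x = {node: 0.0 for node in nodes}
    for _ in range(iterations):
        x = {
            node: alpha * sum(x[p] for p in predecessors[node]) + beta
            for node in nodes
        }
    return x

# Hypothetical graph: users 1 and 2 both follow user 3.
centrality = katz_centrality([(1, 3), (2, 3)])
```

Users without followers keep the base value beta, while user 3 gains 0.1 for each follower.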


K). HITS SCORE — 
The HITS (Hyperlink-Induced Topic Search) algorithm computes
two numbers for a node. The authority score estimates the
value of a node based on its incoming links. The hub score
estimates the value of a node based on its outgoing links.
The NetworkX library is used to compute the HITS score.
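
The mutual dependence of the two scores is easiest to see in code. The sketch below alternates authority and hub updates with Euclidean normalization; it is a plain-Python illustration rather than the NetworkX routine (`networkx.hits`), and the iteration count and toy edges are assumptions.

```python
import math

def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}

def hits(edges, iterations=50):
    """Return (hub, authority) scores for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes linking in.
        authority = {node: 0.0 for node in nodes}
        for src, dst in edges:
            authority[dst] += hub[src]
        authority = _normalize(authority)
        # Hub: sum of authority scores of the nodes linked to.
        hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            hub[src] += authority[dst]
        hub = _normalize(hub)
    return hub, authority

# Hypothetical graph: user 1 follows users 3 and 4; user 2 follows user 3.
hub, authority = hits([(1, 3), (2, 3), (1, 4)])
```

User 3 ends up as the strongest authority and user 1 as the strongest hub.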


L). WEIGHT FEATURES — 

An edge weight value is calculated between nodes in order to
measure their similarity. As the neighbour count goes up,
the edge weight decreases. Intuitively, if one million
people follow a celebrity on a social network, chances are
that most of them have never met each other or the
celebrity. On the other hand, if a user has 30 contacts in
his/her social network, the chances are higher that most of
them know each other.

As the graph is directed, weighted in and weighted out are
calculated separately.


$$W = \frac{1}{\sqrt{1 + |X|}}$$

where X is the set of neighbours of the node.


Formula L) - Weight Features
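
A sketch of the weighted-in and weighted-out computation follows. It assumes the weight has the form 1 / sqrt(1 + |X|), with X the set of followers for the incoming weight and the set of followees for the outgoing weight; the function name and toy edges are illustrative.

```python
import math
from collections import defaultdict

def weight_features(edges):
    """Return (weight_in, weight_out) per node, each 1 / sqrt(1 + |X|)."""
    followers = defaultdict(set)
    followees = defaultdict(set)
    for src, dst in edges:
        followers[dst].add(src)
        followees[src].add(dst)
    nodes = {node for edge in edges for node in edge}
    weight_in = {n: 1.0 / math.sqrt(1 + len(followers[n])) for n in nodes}
    weight_out = {n: 1.0 / math.sqrt(1 + len(followees[n])) for n in nodes}
    return weight_in, weight_out

# Hypothetical graph: users 1, 3 and 4 all follow user 2.
weight_in, weight_out = weight_features([(1, 2), (3, 2), (4, 2)])
```

User 2 has three followers, so its incoming weight is 1 / sqrt(4) = 0.5, lower than the weight of 1.0 for a user with no followers.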
M). SINGULAR VALUE DECOMPOSITION -

The SVD (Singular Value Decomposition) algorithm factorizes
a matrix into singular values and singular vectors. SVD
features are computed for both source and destination nodes.
SVD is widely used in machine learning for dimensionality
reduction techniques and matrix calculations.
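
To illustrate what a singular value and its vectors are, the sketch below finds only the leading singular triplet of a small adjacency matrix by power iteration. The paper does not state how many components were kept or which routine was used, so the method, the function name and the toy matrix are all assumptions; the sketch also assumes a non-zero matrix.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Leading singular value and vectors (u, sigma, v) by power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # One round: u = A v, then v = A^T u, renormalized.
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return u, sigma, v

# Hypothetical adjacency matrix: entry [i][j] is 1 when user i follows user j.
adjacency = [[0, 1, 1],
             [1, 0, 1],
             [0, 0, 0]]
u, sigma, v = top_singular_triplet(adjacency)
```

Entry u[i] can serve as a source-side feature of user i and v[j] as a destination-side feature of user j.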


IV. RESULTS AND DISCUSSION 

1). SUBGRAPH
Figure 5.1 below represents a directed subgraph obtained
from the dataset. From this digraph the number of edges and
nodes is calculated.


Fig Subgraph


The number of nodes in the subgraph: 66

The number of edges in the subgraph: 50

The average In-Degree of the above subgraph: 0.7576
The average Out-Degree of the above subgraph: 0.7576


2). IN-DEGREE GRAPH

Figure 5.2 below represents the graph of In-Degree. With the
help of the In-Degree graph we calculate the number of
followers for each person.

Fig In-Degree

90% of the people have at most 12 followers
95% of the people have at most 18 followers
99% of the people have at most 40 followers
99.9% of the people have at most 112 followers
100% of the people have at most 552 followers


3). PDF of In-Degree Graph

The probability density function graph below shows that very
few people have a large number of followers.

Fig PDF of In-Degree


4). OUT-DEGREE GRAPH

The Out-Degree graph below shows the number of persons that
each person is following.

Fig Out-Degree

90% of the people follow at most 12 persons
95% of the people follow at most 19 persons
99% of the people follow at most 40 persons
99.9% of the people follow at most 123 persons
100% of the people follow at most 1566 persons


5). PDF of Out-Degree Graph

The probability density function below shows that very few
people follow a large number of people.

Fig PDF of Out-Degree


6). In-Degree - Out-Degree Combined Graph

The In-Degree and Out-Degree graph below shows the number of
people that each person is following and the number of
followers that each person has.

Fig In-Degree and Out-Degree

90% of the people have at most 24 followers and followees combined
95% of the people have at most 37 followers and followees combined
99% of the people have at most 79 followers and followees combined
99.9% of the people have at most 221 followers and followees combined
100% of the people have at most 1579 followers and followees combined


7). Depth and Estimator Graph

Fig Graphs after calculating F1 Score (axis name: Depth)


8). Confusion Matrix Graph

Fig Train Confusion Matrix (panels: Confusion matrix,
Precision matrix, Recall matrix; axis: Predicted Class)

Fig Test Confusion Matrix (panels: Confusion matrix,
Precision matrix, Recall matrix; axis: Predicted Class)


From the above confusion matrices we infer that the accuracy
score on the training data is 95% and the accuracy score on
the test data is 89%.
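
For reference, accuracy, precision, recall and F1 score all follow from the four cells of a binary confusion matrix, as sketched below. The counts in the example are hypothetical and are not the paper's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives and 40 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

With these counts all four metrics come out at 0.8.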


9). Feature Extraction Graph


The graph below shows the relative importance of the
extracted features in the form of a bar graph.


Fig Feature Extractions (plot title: Feature Importances)


From the graph we can say that the follows_back feature is
the most important compared to the other features, and the
SVD (Singular Value Decomposition) features are the least
important.


V. CONCLUSION AND FUTURE WORK 

The proposed system is a friend recommendation system that
suggests friends to users using a Random Forest Classifier.
The recommendation model achieves an accuracy of 89% on the
test data, which is reasonable for the hardware and the
volume of data available. The dataset consists of about 94
lakh edges. Performance metrics for this model are obtained
by calculating precision, recall and F1 score. The most
important feature calculated is follows_back. For better
results and accuracy, preferential attachment, an SVM
classifier and graph neural networks can be used. This can
improve the performance of the model in future.